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Abstract 



We propose a framework for compressive sensing of images with local distinguishable objects, 
such as stars, and apply it to solve a problem in celestial navigation. Specifically, let x € M. N be 
an TV-pixel image, consisting of a small number of local distinguishable objects plus noise. Our 
goal is to design anmxJV measurement matrix A with m <C N, such that we can recover an 
approximation to x from the measurements Ax. 

We construct a matrix A and recovery algorithm with the following properties: (i) if there are 
k objects, the number of measurements m is 0((fclog iV)/(logfc)), undercutting the best known 
bound of 0(k\og(N/k)) (ii) the matrix A is very sparse, which is important for hardware im- 
plementations of compressive sensing algorithms, and (iii) the recovery algorithm is empirically 
fast and runs in time polynomial in k and log(iV). 

We also present a comprehensive study of the application of our algorithm to attitude de- 
termination, or finding one's orientation in space. Spacecraft typically use cameras to acquire 
an image of the sky, and then identify stars in the image to compute their orientation. Taking 
pictures is very expensive for small spacecraft, since camera sensors use a lot of power. Our 
algorithm optically compresses the image before it reaches the camera's array of pixels, reducing 
the number of sensors that are required. 

Keywords: Compressive sensing; Sparse matrices; Attitude determination; Star tracking. 



1 Introduction 

1.1 Compressive sensing 

Traditional approaches to image acquisition first capture an entire iV-pixel image and then process 
it for compression, transmission, or storage. Often, the image is captured at very high fidelity, 
only to be immediately compressed after digitization. In contrast, compressive sensing uses prior 
knowledge about a signal to obtain a compressed representation directly, by acquiring a small 
number of nonadaptive linear measurements of the signal in hardware [CRT06, Don06]. Formally, 
for an image represented by a vector x, we acquire the measurement vector, or sketch, Ax, where A 
is an m x N matrix. The advantage of this architecture is that it uses fewer sensors, and therefore 
can be cheaper and use less energy than a conventional camera [DDT + 08, FTF06, Rom09]. 

In order to reconstruct the image x from the lower-dimensional sketch Ax, we assume that the image 
x is k-sparse for some k (i.e., has at most k non-zero coordinates) or at least be well-approximated 
by a /c-sparse vector. Then, given Ax, one finds an approximation to x by performing sparse 
recovery. The problem is typically defined as follows: construct a matrix A such that, for any 
signal x, we can recover a vector x from Ax satisfying 





where Err^x) = minfc. S p arse x > \\x — x'\\ 1 and C is the approximation factor. Note that if x is 
fc-sparse, then Err^(x) = 0, and therefore x = x. Sparse recovery also has applications to other 
areas, such as data stream computing [Mut05, Ind07]. 

The problem of designing matrices A and corresponding recovery algorithms has been a subject 
of extensive study over the last few years, with the goal of designing schemes that enjoy good 
compression rate (i.e., low values of m) as well as good algorithmic properties such as low encoding 
complexity and quick recovery times. Low encoding complexity is often achieved by using matrices 
that are binary (entries chosen from {0, 1} or { — 1, 1}), or that have low column sparsity. Column 
sparsity is the average number of non-zero entries per column, namely the average number of 
buckets into which each coordinate of the signal gets split. It is known by now that there exist 
binary matrices A and associated recovery algorithms that produce approximations x satisfying 
Eq. 1 with constant approximation factor C and sketch length m = 0(klog(N/k)). In particular, 
a random Bernoulli matrix [CRT06] or a random binary matrix with column sparsity 0(\og(N/k)) 
[BGI + 08] has this property with overwhelming probability. It is also known that this sketch length 
is asymptotically optimal [DIPW10, FPRU10]. See [GI10] for an overview of compressive sensing 
using matrices with low column sparsity, along with [BI09] for a newer algorithm that we run 
experiments against (Section 3.4). 

1.2 Attitude determination 

Spacecraft determine their attitude, or 3-axis orientation, by taking pictures of the sky ahead of 
them and identifying stars in the image. This function is encapsulated in a star tracker, which is 
essentially a digital camera connected to a processor. To acquire the initial attitude, the camera 

1. Takes a picture of the sky. 

2. Identifies a set of starlike objects in the picture, and computes the centroid of each object. 

3. Tries to match triangles and quadrilaterals formed by subsets of the centroids to an onboard 
database. A match provides approximate attitude information. 

4. Uses the onboard database to determine a set of stars that it expects are in the picture, along 
with their approximate locations. Refines the attitude information by centroiding those stars 
as well. 

Most of the time, a star tracker has knowledge about its approximate attitude, either from the 
previous attitude computation or from other sensors. In that case, it goes from Step 1 directly to 
Step 4, in what we call tracking mode. [Lie02] has an overview of the process. 

There are two types of sensors used in star tracker cameras, CCD (charge-coupled device), and 
CMOS (complimentary metal-oxide semiconductor). CCD sensors have low noise and capture 
a high fraction of the incoming signal, but have power usage and manufacturing costs that are 
super-linear in the number of pixels and that are high in absolute terms. On the other hand, 
CMOS sensors use little power, are cheap to manufacture, and allow random access to pixel values 
(important for tracking mode), but capture less of the incoming signal, and are very noisy. Most 
spacecraft use CCD cameras, but smaller or cheaper spacecraft use CMOS, taking a factor of 10 
or higher hit in precision. 
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1.3 Motivation for a new algorithm 



Ideally, we would like to have a camera with the precision of a high pixel CCD camera, but without 
the extra power and manufacturing costs that drive small spacecraft to use CMOS. The pictures 
taken by star trackers are naturally very sparse, in that most pixels are either empty or contain 
small stars not used in Steps 3 or 4. Also, the algorithms for Step 3 are very robust, and can tolerate 
a substantial fraction of bogus centroid information [Mor97] . A compressive sensing solution would 
optically compress the incoming picture, to reduce the size of the CCD array. 

However, a standard compressive sensing solution runs into several major problems. First, the L\ 
mass of the small stars is large compared to the L\ mass of the signal. In other words, Errj^x) = 
0(||x||i), and so Eq. 1 (Section 1.1) gives no guarantee at all. Second, the signal to noise ratio on 
star trackers is already low, and each non-zero entry of the sensing matrix A will add substantial 
signal-independent noise to the final measurement. [HL07] Finally, each star is spread over multiple 
pixels, and makes only a small contribution to some of them. These pixels are needed to find the 
centroid of the star properly, but compressive sensing recovery techniques are only designed to 
recover the biggest pixels well. 

We address many of these concerns by focusing on a compressive sensing algorithm where A is very 
sparse. Compressive sensing algorithms with sparse measurement matrices are of general interest 
as well. Potential advantages in non-star tracking electronic compressive imagers include reduced 
interconnect complexity [Mei03], low memory requirements for storing the measurement matrix, 
and gain in image acquisition speed due to reduced operations. 

Unfortunately, it is known [NaclO] that any deterministic scheme with guarantee as in Eq. 1 requires 
column sparsity of Q(\og(N/k)). In the randomized case, where A is a random variable, and Eq. 1 
is required to hold only with constant probability over the choice of A, the same paper shows that 
any binary matrix A must have column sparsity as stated. 

In this paper we overcome the above limitations by employing a two- fold approach. First, we 
consider a class of images where the k large coefficients or k local objects can be distinguished from 
each other. Second, we relax the recovery guarantee, by requiring that only a constant fraction of 
the objects are recovered correctly, and only with constant probability. 

1.4 Model description 

Our model for sparse images is motivated by astronomical imaging, where an image contains a 
small number of distinguishable objects (e.g., stars) plus some noise. We model each object as an 
image contained in a small w x w bounding box, for some w = 0(1). The image is constructed by 
placing k objects in the image in an arbitrary fashion, subject to a minimum separation constraint. 
The image is then modified by adding noise. We formalize the notions of minimum separation 
constraint, distinguishability, and noise in the rest of this section. Some of the definitions below 
are illustrated in Appendix A. 

Let x be an A^-dimensional real vector, and assume N = n 2 for an integer n. We will treat x both 
as a vector and as an n x n matrix, with entries x[i,j] for i,j £ [n\. An object o is a w x w real 
matrix. Let O = {oi, . . . , o/J be a sequence of k objects, and let T = {t\, . . . , t^} be a sequence 
of translations in x, i.e., elements from [n — w] 2 . We say that T is valid if for any i / j the 
translations tj and tj do not collide, i.e., we have \\ti — ij||oo > w' for some constant separation 
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parameter w' = £l(w). For o 6 O and t = (t x ,t y ) £ T, we define t(o) to be a w x u; matrix indexed 
by {t Xl . . . , t x + w — 1} x . . . , t y + w — 1}. Using somewhat sloppy notation, the ground truth 
image is then defined as x = 

During our algorithm, we impose a grid G on the image with cells of size w' x w' . Let x c be the 
image (i.e., an ?i/ 2 -dimensional vector) corresponding to cell c. We then use a projection F that 
maps each sub-image x c into a feature vector F{x c ). If y C x c for some cell c and some set of pixels 
y, we use F(y) to denote F(x c ) after the entries of x c \ y are set to 0. If y is not a cell and not 
contained in a cell, we leave F(y) undefined. 

The distinguishability property we assume is that for any two distinct o, o' from the objects C ) U{0}, 
and for any two translations t and t' , we have \\F(t(o)) — F(t'(o'))\\r > T (when it is defined) for 
some threshold T > and some norm || • ||r- In other words, different objects need to look different 
under F. For concreteness, the features we exploit in the experimental section are the magnitude 
(the sum of all pixels in the cell) and centroid (the sum of all pixels in the cell, weighted by pixel 
coordinates), since the magnitudes of stars follow a power law, and the centroid of a star can be 
resolved to .15 times the width of a pixel in each dimension (Section 3.2). The distinguishability 
constraint is what ultimately allows us to undercut the usual lower bound by a factor of log k. 

The observed image x' is equal to x + /i, where fi is a noise vector. The threshold T determines the 
total amount of noise that the algorithm tolerates. Specifically, let \\/j,\\f = S c II ^it^clWr, where /i c 
is the noise corresponding to cell c. We assume that < jkT for some small constant 7 > 0, 

and make no other assumptions about the noise. 

1.5 Results and techniques 

1.5.1 Theoretical result 

Assume sparsity parameter k > ClogN for some constant C, and prior knowledge of k and the 
distinguishability parameter T. We construct a distribution over random binary mx N matrices A, 
such that given Ax' for x' described above, we recover, with constant probability, a set D oik cells, 
such that at least k/2 of the cells fully containing an object are included in D 1 . The matrix has 
column sparsity 0(log fc N), and has m = 0(fclog fe N) rows. Note that we trade off column sparsity 
and compression. If (say) k = iV 1 / 2 , then the column sparsity is constant, and m = 0{N 1 / 2 ). The 
running time for the recovery procedure is 0(fc 3 log| n). 

1.5.2 Empirical result 

We implement a standard attitude determination routine, with the picture acquisition step replaced 
with a simplified version of the theoretical algorithm. Our algorithm performs better recovery on 
small numbers of measurements and is orders of magnitude faster than comparable compressive 
sensing methods. 

1 From this, a simple min or median process can be used to recover an approximation to x c for any c € D. See 
Section II. A of [GI10] for an explanation of the technique. 
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1.5.3 Our techniques 

Our construction of the measurement matrix resembles those of other algorithms for sparse matrices, 
such as Count-Sketch [CCF04] or Count-Min [CM05]: we "hash" each cell c into each of s = 
0(\og k N) arrays of q = O(k) "buckets", and sum all the cells hashed to the same bucket. Each 
bucket defines one measurement of w' 2 pixels, which gives m = 0(k\og k N). Hashing is done using 
either the Reed-Solomon code or the Chinese Remainder code 2 . 

The recovery process is based on the following novel approach. For simplicity, assume for now that 
the image contains no noise, and ignore the effect of two different objects being hashed to the same 
bucket. In this case, all buckets containing distinct objects are distinguishable from each other. 
Therefore, we can group non-empty buckets into k clusters of size s, with each cluster containing 
buckets with a single value. Since q s > N, each cluster of buckets uniquely determines the cell in 
x containing the object in those buckets. 

In order to make this approach practical, however, we need to make it robust to errors. The errors 
are due to distinct objects being hashed to the same bucket, the noise vector /j,, and the grid 
cutting objects into pieces. Because of these issues, the clustering procedure aims to find clusters 
containing elements that are close to each other, rather than equal, and the procedure allows for 
some small fraction of outliers [CKMN01]. For this purpose, we use the approximation algorithm 
for the ^-center problem with outliers, which correctly clusters a constant fraction of the buckets. 
To handle the buckets that are grouped incorrectly, we construct our hash function using a constant 
rate error-correcting code [GurlO]. 

2 Theoretical Results 

A graphical representation of the algorithm is presented in Appendix A. Our scheme works by 
"hashing" each cell c into s = 0(log fc N) different arrays of size O(k). We can think of this as a 
mapping g from [N] to [0(k)] s . As long as each character of the mapping is approximately pairwise 
independent, then (in expectation) most of the k objects will be alone in most of the array locations 
they map to. Our reconstruction algorithm clusters the values in the cells, giving us a noisy version 
y' of the true codeword y = g{c) with a constant fraction of errors. We then efficiently decode from 
y' to c. 

The first and second sections below establish families Q from which we will draw mappings g. The 
third uses Q to construct a distribution over measurement matrices A, and the fourth presents an 
associated recovery algorithm. The main result is stated in Theorem 10. 

2.1 Definitions and preliminaries 

We need an efficient error correcting code that is also approximately pairwise independent in each 
character. This section gives precise definitions of our requirements, and the next section gives two 
codes that achieve them. 

Definition 1. A hash family % of functions h: A — > B is pairwise-independent if, for any x±,x 2 G 
A and yi,y 2 € B with x\ / x 2 , we have ¥x heH [h(xi) = y 1 n h(x 2 ) = y 2 ] = j^p. 

2 Note that our use of the Chinese Remainder code does not incur any additional polylogarithmic factors. 
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For any prime P > N, the function family "Hp : ax+b (mod P) for a,b E [P] is pairwise independent 
when viewed as a set of functions from [N] to [P] . 

In many of our applications the range B is the product of s "symbols" B\ x ■ ■ ■ x B s . For a function 
f : A —¥ B and i 6 [s], we use /i(x) to denote the ith coordinate of /. When B is a product space, 
we will sometimes settle for a weaker notion of pairwise independence. Rather than requiring 
pairwise independence for the whole range, we only require approximate pairwise independence in 
each coordinate: 

Definition 2. Let B = B\ x • • • x B s . A hash family Ti of functions h: A — > B is coordinate- 
wise C-pairwise-independent if, for all i G [s], any x\ / x 2 G A, and all 2/1,2/2 E Bi, we have 
Vx heH [hi(xi) = yi n hi(x 2 ) = 2/2] < j^pj- 

Definition 3. Let B = B\ x ■ ■ • x B s . A function f : A — >■ B is C-uniform if, for all i G [s] and 

all y G B, h Pr xeA [fi(x) = y] < ^ . 

For any function / : B — > D and family % of functions h: A — > B,we use / 0% to denote the family 
of A -> D functions {g(x) := f(h(x)) \ h G Ti}. 

Claim 1. If 1-1 is pairwise-independent and f is C-uniform, then f oTi is coordinatewise C 2 - 
pairwise-independent. 

Proof. Let Ti be a family of functions A — > B and let /: B — > D = D± x ■ ■ ■ x D s . Then for any 
i G [s], any x\ 7^ X2 G A, and all 2/1,2/2 G -Dj we have: 

Pi[f i (h(x 1 )) = y 1 nf i (h(x 2 )) = y 2 ] 
= Yl P T J h (xi) = n h{x 2 ) = z 2 n f^) = yi n fi(z 2 ) = y 2 ] 



Y ID|2 



Pr [/iW = yin/i(«j)=ife] 



= D [/*(«i) = yi n /i(z 2 ) = 2/2] 

= Pr R [/ i (z 1 ) = yi] Fr [f l (z 2 ) = y 2 ] 

c 2 

< 



BA 2 



as desired. □ 

Definition 4. We saj/ i/iat a function f : A ^ B for B = B\ x ■ ■ ■ x B s is an error-correcting code 
of distance d if, for any two distinct x\,x 2 G A, f{x\) and f{x 2 ) differ in at least d coordinates. 
We say that f is efficiently decodable if we have an algorithm f" 1 running in log *- 1 ) \B\ time with 
f l {v) = x f or an y x G A and y G B such that f(x) and y differ in fewer than d/2 coordinates. 

Recall the hash family Tip : ax + b (mod P) of functions [N] — > [P] . 

Claim 2. /// is an efficiently decodable error- correcting code with distance d, then so is f o h for 
every h G Tip with a / P. 

Proof. Since a 7^ P, there exists an a -1 modulo P, and we can efficiently compute it. Hence 
h is injective, so / o h is an error-correcting code of distance d. Furthermore, (/ o h)~ l (x) = 
a~ l (f~ l (x) — b) (mod P) is efficiently computable. □ 
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Definition 5. We say that a family Q of functions g: A — > B\ x • • • x B s is an (N, s, d) q -independent 
code if Q is coordinatewise J^-pairwise independent, q < \B{\ < 2q for all i £ [s], \A\ > N, and with 
probability at least 1 — 1/N over g G Q we have that g is efficiently decodable with distance d. 

Claims 1 and 2 give the following lemma: 

Lemma 3. If f : [P] — > B\ x ■ ■ ■ x B s is 2-uniform and efficiently decodable with distance d, and 
Q < \Bi\ < 2q for all i, then / o "Hp is a (N, s, d) q -independent code. 

We now show that (iV, s, <i) g -independent codes have few collisions in expectation. 

Lemma 4. Suppose g: A — > B\ x . . . x B s is drawn from a (TV, s, d) q -independent code. Let 
S,S' C A. Define the set of "colliding" symbols 

X = {(a, i) | a £ S,i £ [s], 3a £ S' s.t. gi(a) = gi(a'), a / a}. 
With probability at least 7/8, \X\ < 32 \S\ \S'\ s/q. 
Proof. We observe that 

E [|*|] = EE Pr [(«'*) G *] 

i£[s] aeS 

<EEE P *W=ffi(a')] 

ie[s] aeSa'eS' 

= EEEE Pr ^(«) = z n = a 

i<E{s] aeSa'eS' zeB, 

^EEE 

ie[s]aeSa'£S' z&Bi ^ 

< s\S\ \S'\4/q. 

Hence, by Markov's inequality, \X\ < 32 \S\ \S'\ s/q with probability at least 7/8. □ 
2.2 Two code constructions 

We explicitly give two (N,s,s — r ^-independent codes. Both are achievable for any parameters 
with 2N < q r and s < q/logq (and the first code allows any s < q). We let P be a prime in 

2.2.1 Reed-Solomon code 

Let q > s. The Reed-Solomon code fjis'- [q r ] — > [o\ s is defined for f(x) by (i) interpreting x 
as an element of IFT, (ii) defining \x G to be the degree r — 1 polynomial with coefficients 

corresponding to x, and (iii) outputting f(x) = (xx(^), ■ ■ ■ , Xx(s))- It is well known to have distance 
s — r and to be efficiently decodable [Jus76]. 

Claim 5. Let f : [P] — >■ [q] s be the restriction of fas to [P]. Then f is 2-uniform, so Qrs = f °T~ip 
is a (N, s,s — r) q -independent code. 
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Proof. Basic facts about polynomials give that fas is 1-uniform. Since P > q r /2, / is 2-uniform. 
Lemma 3 then gives the result. □ 



2.2.2 Chinese remainder theorem (CRT) code 

Let pi,...,p s € [q, 2q] be distinct primes; note that the asymptotic distribution of prime numbers 
implies q/logq = £l(s). Hence for any x £ [N], any r of the residues mod p±,...,p s uniquely 
identify x. The CRT code fcRT '■ [P] — > [pi] x . . . x \p s ] is defined by taking the residues modulo 
each prime. It has distance s — r and is efficiently decodable [GRSOO]. 

Claim 6. The CRT code Jcrt is 2-uniform. Hence Qcrt = fcRT ° Hp is a (N,s,s — r) q - 
independent code. 

Proof. Let i £ [s]. The projection of fcRT{x) onto its ith coordinate is x mod pi. Hence over the 
domain [P] , the ratio between the likelihood of the most common and the least common values in 
the range is jp/p'] < 2. Thus fcRT is 2-uniform, and Lemma 3 gives the result. □ 

2.3 The measurement matrix 

In this section we present the measurement matrix A. A graphical representation of the measure- 
ment process is presented on the first page of Appendix A. Let O = {oi, . . . , Ok} be a sequence of 
k features, and let T = {ii, . . . , t^} be a sequence of (non-colliding) translations in x. Let fi be the 
noise vector, and let x' be the noisy image. Finally, let a,/3,d,rf > be (small) constants whose 
values will be determined in the course of the analysis. 

At the beginning, we impose a square grid G with w' x w' cells on the image x' , such that w' = w/a. 
The grid is shifted by a vector v chosen uniformly at random from [w'] 2 . Let S' be the set of cells 
that intersect or contain some object ij(oj), and S C S' be the set of cells that fully contain 
some object U(oi). Observe that a fixed object is fully contained in some cell with probability 
(1 — w/w') 2 > 1 — 2a, since each axis of the grid intersects the object with probability w/w'. This 
implies that the expected number of cells in S' — S is at most 2ak, and by Markov's inequality 
\S' — S\ < 16ak with probability 7/8. From now on, we will assume the latter event holds. Let 
k' = \S'\. We choose a > such that k! < 2k. 

Our measurement matrix A is defined by the following linear mapping. Let G denote the set 
of cells. Let g: G — > B = B\ x • • • x B s be drawn from a (TV, s, 4(36 + /?)s) g -independent code 
(such as either Qrs or Qcrt)- Moreover, we require that k/q < ??; such a code is achievable 
per Section 2.2 with s = 0(log fc iV) as long as k > ClogN for some constant C (such that both 
9 (i-4(35+/3))s > > 2N an d s < log N/ log k < g/loggj. For each i = l,...,s, we define a 
| Bi | -dimensional vector z % whose entries are elements in K m , such that for any j 

Zj = x c- 

9i(c)=j 

That is, we "hash" all cells into \Bi\ > q buckets, and sum all cells hashed to the same bucket. The 
measurement vector z = Ax 1 is now equal to a concatenation of vectors z 1 , . . . , z s . Note that the 
dimension of z is equal to m = w' 2 Y^, \Bi\ = 0(qs) = Q(k\og k N). 
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2.4 Recovery algorithm 



A graphical representation of the recovery process is presented on the second page of Appendix A. 
The recovery algorithm starts by identifying the buckets that likely contain the cells from S, and 
labels them consistently (i.e., two buckets containing cells from S should receive the same label), 
allowing for a small fraction of errors. We then use the labels to identify the cells. 

The algorithm runs as follows. For a set X C [s] x [2q] of pairs of indices, let F(X) denote 
{F(zj):(i,j)eX}. 

1. Identify R = : \\F(zj)\\r > T/2} (that is, R contains the "heavy cells" of the measure- 
ment vector z). 

2. Partition R into sets R\R l , . . . ,R k such that \R'\ < 5sk, and such that for each 1 < I < k 
the diameter of F(R l ) is at most T/2. 

3. For each label I = 1, . . . , k, create a vector u l G B such that for each i = 1, . . . , s, u\ = j 
if (i,j) G R l (if there are many such j, ties are broken arbitrarily), or u\ =_L (an arbitrary 
erasure symbol) if no such j exists. 

4. For each label / = l,...,k apply the decoding algorithm 3 for g to u l , obtaining a (possibly 
invalid) decoded cell d l . 

We analyze the algorithm by keeping track of the errors at each step. 

Step 1 For any cell c G S and i = 1, . . . , s, we say that i preserves c if \\F(z l g ^) — i^(x c )||r < T/24 
and gi{c') ^ gi{c) for all other d G S. That is, there is no collision from the hashing process, and 
the total amount of distortion due to the noise /j, is small. Let P = {(i, gi(c)) : i preserves c}. Note 
that P C R. We show that P is large and that most of R is in P. 

Lemma 7. With probability at least 7/8, 

\P\ > (1 - P)sk. 

Proof. Consider any pair (c, i) G S x {1, . . . , s}, and let j = gi(c). If i does not preserve c, it must 
be because either (i) there is another cell c' G S' , c' ^ c such that giid) = j, or because (ii) the 
total noise affecting Zj, equal to F^fij) < 5^ gi ( c )=j Fif^c), has norm at least T/24. 

By Lemma 4 with probability at least 7/8 the number of pairs affected by (i) is at most 32ks\S'\/q. 
The event (ii) is determined by the noise vector /x. However, for each i, there are at most 
^ c ^y^ t c ^ r = < 247A; additional cells c G S that are not preserved under i due to this 

reason, where the latter inequality follows from the assumption that \\fJ,\\ F < jkT (Section 1.4). 

Altogether, the total number of pairs (c, i) such that c is not preserved by i is at most 

32sk \S'\ /q + 24 7 sfe < [32r/(l + 16a) + 24 7 ]sfc = f3sk 
for some small constant /3, as desired. □ 



technically, we replace each _L in u l with an arbitrary j before running the decoding algorithm, since the decoding 
algorithms don't know about _L. 
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Lemma 8. With probability at least 3/4, 

\R\P\ < 5sk. 

Proof. Any element (i, j) of R\P ("heavy but not preserved") must belong to one of the following 
three categories: 

1- J = 9i( c ) ior c £ S such that c is not preserved by i. By the previous lemma, there are at 
most f5sk such pairs (c, i) with probability at least 7/8. 

2- j = gi{c) for some cell c G S' \S. There are at most 16ask such pairs (c, i), with probability 
at least 7/8. 

3. The vector T(/x*) = Yl g ( c )=j ^(f^c) has norm at least T/2. There are at most 2jsk such pairs 

This implies that with probability at least 3/4 the total number of pairs (i, j) G R \ P is at most 

(/3 + 16a + 2-f)sk = 5sk 

for some small constant S, as desired. □ 

Step 2 Observe that the elements of F(P) can be clustered into k clusters of diameter T/12. 
Thus, by the previous lemma, there is a fc-clustering of all but 5sk elements of F(R) such that 
the diameter of each cluster is at most T/12. We now apply a 6-approximation algorithm for this 
problem, finding a fc-clustering of F(R) such that the diameter of each cluster is at most T/2. Such 
an approximation algorithm follows immediately from the 3-approximation algorithm for fc-center 
with outliers in [CKMN01], which gives a fc-clustering with radius at most T/4 and hence diameter 
at most T/2. 

Step 3 Consider cells c, c' £ S such that c is preserved by i and d is preserved by i! . If F(z l g .( c \) 

and F(z i> , n ) belong to the same cluster, then it must be the case that c = c', since otherwise the 

distance between them would be at least T — 2T/24 > T/2. In other words, for each I, if u l C PCiR 1 
contains at least one element of P, then all the elements of u l are "derived" from the same cell. 

Lemma 9. With probability at least 3/4, v},...,u k contain a total of at most 25sk errors and 
(5 + (3)sk erasures (i,l such that u\ =-L). 

Proof Let R" = R \ R' = R 1 U • • • U R k . Let P' = P n R' , and P" = P n R". 

Note that \P'\ < \R'\ < 5sk. Each error in u 1 , . . . , u k corresponds to a unique element of R" \ P", 
and we have 

\R" \ P"\ < \R" \P\ + \P\ P"\ <\R\P\ + \P'\ < Ssk + Ssk = 25sk. 

Additionally, {u 1 , . . . ,u k } contains at least P" elements total, and so the number of erasures is at 
most sk — \P"\ = sk — \P\ + \P'\ < (3sk + 5sk, where we use \P\ > (1 — (3)sk from Lemma 7. □ 

Step 4 We can replace erasures by errors, and conclude that u 1 , . . . , u k have a total of at most 
(35 + f3)sk errors. It follows that at least k/2 of them have at most 2(3<5 + /3)s errors, and therefore 
can be decoded. Therefore, the set D = {al 1 , . . . , al k } contains at least k/2 elements of S. 

The running time of the recovery algorithm is dominated by Step 2, where we approximate k- median 
with outliers via the method in [CKMN01]. This takes 0{(ks) 3 ) = 0(k 3 log 3 k n) time. 
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Theorem 10. Assume k > ClogN for some constant C, a signal x with k objects, and a noise 
vector fi, all subject to the constraints delineated in the Model description of Section 1. There is a 
distribution over random binary mx N matrices A, m = 0(klog k N), and an associated recovery 
algorithm with the following property. Suppose that the algorithm is given Ax' for x' = x + fi. Then 
the algorithm recovers (with probability at least 3/4j a set D of k cells, such that at least k/2 of 
the cells fully containing an object are included in D. Moreover, the algorithm runs in 0(k 3 log|n) 
time and the matrix has column sparsity 0(log fc N). 

3 Applications to Attitude Determination 

Star trackers determine their attitude, or 3-axis orientation, by taking pictures of the sky and 
identifying stars in the image. We provide a detailed review of the current technology in the first 
section below. We then present a compressive sensing algorithm for attitude determination, along 
with a discussion about hardware implementation. Finally, we present results from a software 
simulation of the algorithm. 

3.1 Current star trackers 

A star tracker is essentially a digital camera, called a star camera, connected to a microprocessor. 
We describe various characteristics of the camera hardware and star identification algorithms. 

3.1.1 Numbers 

We first provide some numbers from [Lie02] and [WL99] to give a sense of scale. As of 2001, a 
typical CCD star tracker consumes 5-15W of power. A small spacecraft uses 200W of power, and 
a minimal one uses less than 100W, so this can be a substantial amount. 

A high-end star tracker can resolve approximately the same set of stars that an unaided human 
can on a moonless night away from all light pollution. The number of stars in a star tracker's 
database varies from 58 to many thousands. The camera's field of view can vary from 2x2 degrees 
to 30 x 30 degrees, or anywhere from .01% to 4% of the sky. For comparison, the full moon is 
about .5 degrees across, and an adult fist held at arm's length is about 10 degrees across. A CCD 
camera can have up to a million pixels, and the accuracy of the final attitude is usually around .001 
degrees (1 standard deviation), compared to .01 degrees for the next best sensors. The attitude is 
updated anywhere from 0.5 to 10 times a second. 

3.1.2 CCD and CMOS 

CCD (charge-coupled device), and CMOS/APS (complimentary metal-oxide semiconductor /active 
pixel sensor) are the two different types of sensors used in star cameras. We abuse notation and 
use CCD/CMOS to refer to the sensor, annxn array of sensors, the architecture of this array, and 
the star cameras they are a part of. Both CCD and CMOS turn incoming photons into electrons 
using the photoelectric effect, read the electrons as a single voltage or charge, and then digitize the 
charge with an analog to digital converter (ADC). 
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The CCD has a single ADC located in the corner of the pixel array. A CCD array is read as 
follows: each pixel repeatedly transfers its charge into a neighboring pixel, so that the charge from 
any given pixel eventually travels a taxicab path to the ADC. Charges from different pixels are 
never combined, so there are a total of G(n 3 ) charge transfers. Since the ADC only digitizes one 
pixel at a time, it also takes @(n 2 ) time to read the whole array. In addition, each charge transfer 
leaves a fraction e of the electrons behind, where 1 — e equals the charge transfer efficiency. The 
electrons in the farthest pixels undergo 0(n) charge transfers, and in practice it is costly to achieve 
e < 1CT 5 , which puts a bound on the maximum size of a CCD array [Hol98, Fos93]. Even if future 
technology were to allow a better charge transfer efficiency, it is worth noting that each charge 
transfer uses a large constant amount of power, and that the total number of charge transfers is 
super-linear in the number of pixels. 

On the other hand, CMOS devices have an ADC built into every pixel. This solves all of the 
problems noted above, and adds another important feature: random access reading. In other 
words, we can choose to read and digitize only a subset of the pixels, and in practice, that is done, 
saving power and subsequent digital processing costs [Lie02]. However, the ADCs take up valuable 
real estate, and reduce the percentage of the chip that is available to collect photons. CMOS devices 
also generate substantially more noise than CCDs, further reducing the signal to noise ratio [LitOl]. 

In practice, many consumer products such as cell phone cameras use CMOS, while scientific instru- 
ments use CCDs. Nevertheless, star trackers on small or low-power-budget satellites are starting 
to use CMOS, forgoing factors of 10 and higher in precision. We give the specification of a CCD 
tracker and a CMOS tracker in current (2011) production to illustrate the difference, as well as 
some of the numbers in highlighted in Section 3.1.1. The CT-602 Star Tracker has a CCD camera 
and is made by Ball Aerospace & Technologies. It uses 8-9W of power, weighs 5.5kg, has 6000 
stars in its database, an 8 x 8 degree field of view, 512 x 512 pixels, an attitude accuracy of .0008 
degrees, and updates 10 times a second [Bal]. Comtech AeroAstro's Miniature Star Tracker has 
a CMOS camera, uses < 2W of power, weighs .4-. 8 kg, has 1200 stars in its database, a 24 x 30 
degree field of view, and 1024 x 1280 pixels, but has an accuracy of only .03 degrees and updates 
only 2 times a second [Com] . 

3.2 Star tracker operation 

We now study the process of attitude determination in more detail. As noted in Section 1.2, we 
often have an approximate attitude and go directly from Step 1 to Step 4 below. 

0. Celestial intuitions Let the apparent mass (hereafter, mass) of a star be the number of photons 
from the star that reach our camera. The masses of the stars in the night sky follow a power-law 
distribution, with exponent —1.17 [Lie02]. 

The stars are essentially infinitely far away, and can be treated as point sources of light. In 
particular, if a star camera were perfectly focused, the light from any given star would land on 
exactly 1 pixel, and we would not be able to resolve the centroid of any star at higher than pixel 
resolution. Star camera lenses are therefore intentionally out of focus, so that the light from a star 
lands on multiple pixels, which we can then average to get a centroid with sub-pixel resolution. 
The blurring can be mimicked by convolving the image with a Gaussian of radius .5 pixels 4 [Lie02], 

4 Technically, we want to use an Airy function, not a Gaussian. But for a blur of radius .5 pixels a Gaussian is a 
good approximation. 



12 



after which most of a star's mass lands on 4-6 pixels. Note that this number is independent of the 
star's mass, the field of view, or the total number of pixels. 

Additionally, all the stars that are visible to star trackers (or humans) are a part of the Milky Way 
galaxy, which is shaped like a disk. This means that the stars are not distributed uniformly over 
the sky. The density of stars varies by a factor of 4 [LOA05]; Figure 3 in Section 3.4 gives some 
intuition. Most star trackers with a small field of view are not able to resolve the area perpendicular 
to the galaxy. 

1. Image acquisition The expected number of photons (i.e. mass) captured by a pixel is propor- 
tional to the area of the lens and the exposure time. Photons hit a pixel, and eject an electron 
with some constant probability via the photoelectric effect. The pixel is then read and digitized 
(Section 3.1.2). 

Almost all the noise is introduced at this step. The signal-dependent portion of the noise, shot 
noise, is due to random photon arrival times. This is best modelled by a Poisson distribution, 
which can be approximated as a Gaussian with a standard deviation equal to the square root of 
the expectation [Hol98]. For example, if a pixel expects to see 900 photons, it will see 900 photons 
with a standard deviation of y/900 = 30 photons. 

There is also a large per-pixel component of the noise, which depends on the hardware, temperature, 
and other factors. See [Hol98] for a summary. 

2. Centroiding We locate a set of stars S in the image, by finding pixels with a value several 
standard deviations above the mean. We then centroid each star, by either taking the weighted 
mean of a few neighboring pixels, or by doing a more complicated Gaussian fitting. For a bright 
star, this can give a centroid resolution of .1 pixels in each dimension, even under moderate noise. 

There are several errors that can occur here. If the threshold for being a star is low, we may get 
stars that are composed only of noise. If our catalog is incomplete (as it almost certainly will be), 
we may get stars in S that aren't in the catalog. Finally, radiation or dust can occasionally cause 
a pixel to have an arbitrary value, in which case it might get classified as a star. 

3. Star identification In this step, we match the stars in S to an onboard database. If S has 
zero or one stars, there is nothing to do, and we give up. Some star trackers may be able to 
make an identification when \S\ =2, but we assume that \S\ > 3. We explain the algorithm 
from one common star identification algorithm [Mor97], since almost all of them follow the same 
outline [SM09]. 

Preprocessing Depending on the size of the lens, the capabilities of the processor, and the expected 
signal to noise ratio, anywhere from a few dozen to tens of thousands of stars are selected from a 
catalog. The star tracker firmware is loaded with a data structure D that stores all pairs of stars 
that can appear in the same picture, along with their distances. 

In space We check subsets of three stars from S to see if they can form a valid triangle using edges 
from D. Once we find one, (and if |S| > 4), we try to find a fourth star from S such that all ( 2 ) 
distances are consistent with D, in which case we declare a match. We then use the 3-4 matched 
stars to determine an approximate attitude. 

Even with tens of thousands of stars, it is extremely unlikely that we find a match that is consistent 
with D that ends up being wrong. In other words, most quadrilaterals consistent with D are far 
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away from all other such quadrilaterals, and most quadrilaterals formed using erroneous stars match 
nothing. However, it is possible for there to be no noise in Steps 1-2 and still not have a match, 
due to catalog errors and omissions. 

4- Precise attitude computation We use the approximate attitude information and the onboard 
database to obtain the rough location of other stars in the picture. The attitude is then refined 
using the centroids of those stars as well. 

Note that with a CMOS imager, if we start with an approximate attitude, we can actually skip 
most of Step 1 as well, and only read and digitize the handful of pixels under the stars we want to 
centroid. 

3.3 Specialization of the general algorithm 

We specialize the theoretical algorithm presented in Section 2 to obtain a new algorithm for attitude 
determination, which we call Attitude Determination Under Analog Folding (ADUAF). The local 
geometric objects turn into stars, and the features we use are star centroid and mass. We use J'crt 
as our underlying error correcting code, rather than fjis (Section 2.2). 

The biggest change from the theoretical algorithm to the practical algorithm is that we assume the 
stars are randomly rather than adversarially distributed. This means we no longer need to compose 
our error correcting code with the hash family % p . Additionally, we no longer need to shift the 
grid G by the random vector v, as we do in Section 2.3. In fact, the notion of a grid is no longer 
needed at all, and we allow the decoded cell d l (Section 2.4) to be any w' x w' patch of the original 
image. 

Given the physical cost of splitting the signal, the column sparsity s is set to the smallest value our 
algorithm can tolerate, which is 2. This has the additional effect of turning the clustering process 
from Step 2 of Section 2.4 into a much simpler bipartite matching process. 

Finally, rather than just recovering the w' xw' patch d!, we recover the centroid of d l , or the centroid 
of the star on d l . We then run the recovered centroids through the star identification process of 
Section 3.2. The full algorithm is presented below. 

Measurements 

If pi and P2 are the primes used to construct the CRT code in Section 2.2.2, we find primes p\ 
such that p\ k, yjp~i. Note that we don't literally need p\ and p' 2 to be prime, as long as they 
are relatively prime, since the Chinese Remainder Theorem from Step 4 of the recovery algorithm 
applies for relatively prime numbers as well. We will use the word prime to refer to p\ and p' 2 even 
when they are just relatively prime. 

Thinking of the noisy incoming signal x' as an n-by-n image, we define z l to be a p^-by-p^ image 
with 

ci=ji (mod p' t ) 
C2=?2 (mod p' % ) 

We say we fold x' to obtain z l . Figure 1 provides some intuition. 
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Figure 1: Taking measurements with column sparsity 2. Picture idea from [UGN + 09]. 



We actually think of each z % as a torus, and identify the top/bottom and left/right edges together, 
allowing recovery of stars that are on the edge of the folded picture. Since the stars were assumed 
to be randomly distributed in x', they are randomly distributed within each z % as well. Also, one 
could define the "measurement vector" z to be a 1-dimensional representation of the pair (z 1 ,^ 2 ), 
and construct a measurement matrix A accordingly. However, it is more useful to think of each z % 
as a 2-dimensional image. 



Recovery algorithm 

We follow the presentation of the corresponding paragraph in Section 2.4. For concreteness, we use 
a few specific numbers from our software implementation in Section 3.4. 

1. In each z l , we identify ten 3 x 3 (w x w) cells with high mass such that no two cells collide in 
more than four pixels. We allow the 3x3 cells to wrap around the edges of z l , as in a torus. 

2. In place of the /c-centering algorithm, we greedily choose up to eight pairs of cells (c 1 , c 2 ) from 
(z 1 , z 2 ) such that the feature vectors F{c l ) and F(c 2 ) are close. In our case, F(c) is a triple 
of values: the mass of c, along with two coordinates for the centroid of c. 

3. No longer needed. Each pair from Step 2 corresponds to one of the u l . 

4. In place of the error correcting code, we simply apply the Chinese Remainder Theorem in 
each dimension to recover a 3 x 3 region of the original image for each pair from Step 2. 

We compute centroids of the recovered 3x3 regions, by averaging the centroids (weighted by mass) 
of the corresponding 3x3 regions of z l and z 2 . This is now S in Step 3 of Section 3.2. 

Step 1 above takes time linear in the number of measurements, or time p\+p2 ~ n. Steps 2-4 take 
time at most quadratic in the number of cells chosen in Step 1 (ten, in our example). 



Potential hardware 

We have identified several existing hardware architectures that implement the folding mechanism 
above. In our case, we would use s copies (s = 2 above) of the folding hardware, where s is the 
column sparsity. We call each of the stacked squares in Figure 1 a piece of the image. We denote 
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the number of pieces by r (r = 9 in both z l and z 2 in Figure 1). After being folded, each piece of 
the image lands on the focal plane, which has an array of z % pixels. 

The most plausible architecture uses mirrors to directly reflect the pieces of the image onto the focal 
plane. The single-pixel camera [DDT + 08] implements a generalized version of this for z % = 1 using 
n moving mirrors. We would use r rigid mirrors for each z l , which is a substantial simplification 
of what they implemented. 

Another possibility is to use r different lenses, each of which focuses one piece of the image onto 
the focal plane [UGN + 09]. The challenge with this design will be making all the lenses face the 
same direction. 

The remaining two architectures are designed for unrelated earth-based tasks, and are more waste- 
ful of the signal than is appropriate for a star tracker. Nevertheless, they show the extent to 
which researchers are already thinking about hardware designs for folding. Both of the designs 
use beamsplitters and mirrors to combine the signal, and lose a factor of r from the signal in the 
process. Figure 2 depicts an element from each design. To extend 2(a) to r pieces one would use 
logr different such image combiners, and to extend 2(b) to r pieces one would use \fr — 1 stacked 
beamsplitters plus a mirror to fold in one direction, and y/r — 1 stacked beamsplitters plus a mirror 
to fold in the other. 




beamsplitter 
mirror 



(a) From [UGN+09]. (b) From [TAN 10]. 



Figure 2: Two examples of how to combine pieces of an image using a mirror and beamsplitter. 
Dashed lines indicate lost signal. 

Finally, it is possible we could split the signal and fold it after it hits the focal plane but before 
it gets digitized, which would not save on sensor costs, but would reduce the number of ADC 
computations, and reduce the load on the onboard processors. CCD images query an entire row 
of the image before digitizing [LitOl], so folding in at least the first dimension could be relatively 
easy. CMOS imagers are already built directly onto integrated circuits, so additional circuitry for 
folding is likely to be cheap [RGC + 10]. Several (non-satellite) CMOS imagers have already been 
built that use dense compressive sensing matrices to reduce the number of measurements that need 
to be digitized [RGC+10, MJS+10]. 

3.4 Software implementation 

We implement the algorithm presented in Section 3.3, and run experiments on simulated images. 
Source code is available at http://web.mit.edu/rishig/papers/local-geo/. 
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For simplicity, we project the stars onto a rectangular interval, rather than operating on a sphere 
(Figure 3). We index the rectangle by right ascension (longitude) a £ [— vr,7r] and declination 
(latitude) 5 G [— 7r/2, 7r/2]; for example, 5 = is the equator, and 5 = ir/2 is the north pole. So 
that the approximation makes sense, we ignore the portion of the sky where \5\ > tt/2 — ir/8 (the 
dashed blue lines in Figure 3). This also has the effect of removing the portion of the sky that has 
the fewest stars, which some star trackers don't operate on anyway. We assume without loss of 
generality that the camera is axis-aligned. 

Brightest 40000 stars in the Night Sky 
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Figure 3: Mercator projection of the night sky. The dense omega-shaped region is the central disk 
of the galaxy. We test our algorithm on the area between the dashed blue lines. 



We fix n = 800 (N = 640000) for all experiments. We expose the camera to a .08 radian by .08 
radian (4.6 by 4.6 degree) patch of the sky, which means that a typical image will have 3 to 10 
bright stars (10%-ile to 90%-ile) and 50 to 150 stars that act as background noise. Above, we 
defined the mass (apparent mass) of a star to be the number of photons from the star that hit 
our camera. In our pictures, if the median total star mass is scaled to 1, the 10%-ile mass is .6, 
and the 90%-ile mass is 2.25. These numbers were determined empirically from the Smithsonian 
Astrophysical Observatory (SAO) Star Catalog [Smi]. Recall the mass of the j th brightest star in 
the sky is OCT 1 ' 17 ) [Lie02]. 

We choose the stars for the preprocessed database D (Step 3 of Section 3.2) as follows. We extract 
a subset SAO' from the full SAO Catalog, by taking the 10 most massive stars in every ball of 
radius .08 radians. SAO' has 17100 stars, compared to 259000 stars in the SAO catalog. 

To generate test images, we randomly select axis-aligned patches of the sky from the area between 
the blue lines of Figure 3, and use the SAO star catalog to simulate star positions and mass. We 
convolve the stars with a Gaussian of radius .5 pixels, which causes the vast majority of a star's 
mass to fall within a 3 x 3 box. We then apply Poisson noise to account for photon arrival times, and 
fold the image using two relatively prime numbers. We add Gaussian noise (amount varying by ex- 
periment) to each pixel of both folded images to account for all signal-independent sources of noise. 
To keep the recovery process simple, we do not account for the Poisson noise introduced by splitting 
the signal; in other words, we apply Poisson noise once to each star, whereas in a hardware imple- 
mentation it is likely that Poisson noise will be applied independently to each folded copy of a star. 



17 



Figure 4 has a few pictures to help build intuition about star density and what stars look like. It 
is generally possible to see where the biggest stars are located, though some fraction of them get 
occluded by small stars or noise. 




200 400 600 800 

(a) Underlying noiseless signal x (zoom in). 




5 10 15 20 25 

(b) p = 29, No Noise. 




5 10 15 20 25 

(c) p' = 29, Noise = 150 



Figure 4: log(mass) in sample images from a representative part of the sky. The legend on the 
right applies to all three images. We cut off pixels with values below before taking the log in (c). 



We run our algorithm, ADUAF, as well as Sequential Sparse Matching Pursuit (SSMP) on 159 
images of the sky. SSMP includes a linear (in N) time sparse binary compressive sensing recovery 
algorithm, with recovery quality on par with other known methods [BI09]. We use the same folded 
images, or equivalently, the same measurement matrix, for both algorithms. 

ADUAF recovers a list of centroids, and we find and centroid stars in the image recovered from 
SSMP. We run the star identification algorithm from Step 3 of Section 3.2 on the outputs of both 
of them, and declare success if they identify the stars correctly. We report our results as a function 
of the standard deviation of the added Gaussian noise (Figure 5). 

The first observation we make is that ADUAF works very well down to an almost minimal number 
of measurements. The product v'xp' 2 has to be greater than 800, and the minimal set of primes is 26 
and 31. As the number of measurements increases, SSMP catches up and surpasses ADUAF, but 
we note that running SSMP (implemented in C) takes 2.4 seconds per trial on a 2.3 GHz laptop, 
while ADUAF (implemented in Octave/Matlab) takes .03 seconds per trial. Computation power on 
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Std of error per measurement 



Figure 5: Experimental results. Each point on the figure is computed using the same 159 underlying 
images. 

a satellite is substantially lower than that of a low end laptop, and given that the entire acquisition 
has to happen in .1 to 2 seconds, it seems unlikely that any algorithm linear or near linear in N 
is going to be practical. Finally, we note that the plot lines for both SSMP and ADUAF could be 
improved by a more sophisticated implementation of the star identification algorithm. 

Acknowledgements 

The authors would like to thank Tye Brady and Ben Lane from Draper Laboratory for numerous 
conversations and for help with the data. We would also like to thank the anonymous reviewers 
for their thorough reviews and for helping clarify the presentation. 

This research has been supported in part by a David and Lucille Packard Fellowship, MADALGO 
(Center for Massive Data Algorithmics, funded by the Danish National Research Association) and 
NSF grant CCF-0728645. R. Gupta has been supported in part by a Draper Laboratory Fellowship. 
E. Price has been supported in part by an NSF Graduate Research Fellowship. 



19 



A The Algorithm in Pictures 



Measurements We first compute Ax' from the received signal x' . An element of the (N, 
independent code Gcrt = fcRT Hp is depicted below. 



Received signal x' = x + fi 
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There are a total of N = n 2 pixels. The goal 
is to recover the k objects (colored polygons). 
Each object fits in a w x w pixel box. 
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Impose a (randomly shifted) grid G of cells of width 
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w/a. For clarity we no longer draw the pixel grid. 



Apply a pairwise independent hash function such as 
Hp : x — >■ ax + b (mod P) to a numbering of the cells. 
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Measured signal z = Ax' 
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Bi\> q buckets in row z % 



Apply an error correcting code 
/ that maps each cell onto ex- 
actly one bucket in every row. 
Sum the cells mapping onto each 
bucket. The code shown to the 
left is fcRT, where each cell is 
mapped to its index modulo vari- 
ous (relatively) prime numbers. 
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Recovery We now recover from the measurements Ax' . 



F(z|) 



(2,3,1.1) 
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H4) = (8, 14,9.5) 



Compute the feature vector F{z % -) of each bucket. 
In our experiments, the feature vector contains 
information about mass and centroid. 



F(4) = (8,15,9.2) 
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Set i? = {(i,i) : ||F(*j)|| r is large}. 
Discard buckets not in R. 
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Cluster F(R) = {F(zf} : £ R} into k clusters 
(with outliers). This induces a partition R' ,R} . . . R k 
of R, with F(R l ) equal to the l-th cluster. 

r 1 = 2 = 3 = □ 



Decode each i? to obtain a cell ef in the original image. 
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Though we don't elaborate in the text, a simple min 
or median process can be used to obtain an approxi- 
mation for the contents of each d l . 
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